1.  Introduction


Seismic  signals  observed  at  regional  distances  are  generally  quite  complicated  and  often  exhibit 
dramatic  dependence  on  regional  variations.  Hence,  region-specific  information  regarding 
regional  seismic  discriminants  is  vital  in  distinguishing  nuclear  explosions  from  other  events. 
Unfortunately,  relevant  ground-truth  data,  particularly  for  underground  nuclear  explosions,  do  not 
exist  for  most  regions.  Also,  it  has  yet  to  be  shown  that  a  discrimination  threshold,  established  in  a 
region  for  which  data  exist,  can  be  transported  effectively  to  a  new  region.  Thus,  in  most  cases, 
screening  of  regional  seismic  events,  within  the  context  of  monitoring  the  Comprehensive 
Nuclear  Test-Ban  Treaty  (CTBT),  is  a  problem  of  identifying  unusual  events  (i.e.,  outliers) 
relative  to  routine  seismic  activity  in  their  respective  regions. 

Fisk  et  al.  (1993,  1994,  1995,  1996b)  describe  the  outlier  (or  regional  population)  analysis  and 
provide  numerous  results  of  applications  to  regional  seismic  data  sets  for  diverse  geological 
regions,  for  a  wide  range  of  epicentral  distances  and  magnitudes,  and  for  single  stations  and 
arrays.  Software  to  perform  the  regional  population  analysis  has  been  installed  at  the  Prototype 
International  Data  Centre  (PIDC)  in  Arlington,  VA  (e.g.,  Fisk  et  al.,  1996a).  For  operational  use  in 
screening  regional  seismic  events,  regional  training  sets  must  be  established  for  comparison.  As 
Fisk  et  al.  (1994,  1995,  1996b)  and  others  have  noted,  regional  discriminants  must  be  corrected 
for  distance-  and  frequency-dependent  attenuation  in  order  to  obtain  valid  results.  Furthermore, 
evidence  suggests  that  significant  tectonic  variations  may  require  subregional  training  sets  and 
distance  corrections. 

In  this  report,  we  describe  our  initial  efforts  to  establish  regional  event-characterization  training  sets 
and  distance  corrections  for  35  Primary  and  51  Auxiliary  seismic  stations  of  the  International 
Monitoring  System  (IMS)  network.  These  IMS  stations  were  providing  regular  data  to  the  PIDC  as 
of  January  1997.  Section  2  provides  some  technical  background  on  the  outlier  approach  and 
empirical  distance  corrections.  Section  3  describes  the  IMS  seismic  data  used  in  this  work.  In 
Section  4  we  compute  distance  corrections  for  Pn/Lg  and  Pn/Sn  in  several  frequency  bands,  based 
on  the  regional  training  sets  for  each  IMS  station  with  sufficient  data.  Section  5  discusses  the 
discriminant  distributions  and  outlier  removal,  in  order  to  iteratively  refine  the  distance  corrections. 
In  Section  6  we  categorize  each  station  in  terms  of  the  status  of  the  initial  training  sets  and  distance 
corrections  for  experimental  operational  use  at  the  PIDC.  While  there  are  many  stations  with 
adequate  data  over  a  relatively  broad  range  of  regional  distances,  for  which  reasonable  initial 
distance  corrections  can  be  established,  there  are  many  stations  which  require  further  data  and 
subregional  analyses.  Section  7  provides  some  conclusions  and  recommendations  as  to  the  status 
and  future  direction  of  our  regionalization  efforts. 
2.  Overview of Technical Approach

To  assess  whether  an  event  is  an  outlier  relative  to  the  event  population  in  its  respective  region,  a 
comparison  is  made  to  a  training  set.  A  training  set  consists  of  a  sample  of  regional  discriminants 
for  events  in  the  same  region  (defined  for  now  to  be  within  20  degrees  from  a  given  station)  as  the 
event  to  be  tested.  Ideally,  a  training  set  would  consist  of  events  of  known  type,  with  ground  truth, 
which  have  the  same  location  and  magnitude  as  the  event  being  tested.  In  reality,  however,  such 
events  are  not  available  in  sufficient  numbers  to  afford  this  luxury.  Therefore,  events  with  different 
locations  and  magnitudes  must  be  used  in  the  same  training  set,  leading  to  the  necessity  of 
“correcting”  the  measured  values  of  the  discriminants  so  the  training  events  can  be  interpreted  as 
coming  from  the  “same”  population  as  the  event  being  tested. 

In  most  practical  situations  we  will  not  know  the  event  types,  a  priori.  A  fundamental  assumption 
is  made  that  the  number  of  new  nuclear  tests  in  a  region  will  be  relatively  small  compared  to  the 
number of other types of events in the region. Also, because typical mining practices produce only a relatively small number
of mining blasts above mb 3.5, the majority of regional events above this magnitude level are earthquakes. Hence, the primary concern for
monitoring  above  this  level  is  to  distinguish  potential  nuclear  explosions  from  respective  regional 
earthquake  populations.  We  focus  on  this  case  here.  (Fisk  et  al.,  1993,  1994,  also  considered  cases 
in  which  earthquake  training  sets  were  contaminated  by  large  quarry  blasts  or  rock  bursts.) 

Meaningful  application  of  the  outlier  analysis  requires  a  second  fundamental  assumption,  i.e.,  that 
there  is  at  least  one  regional  discriminant  that,  when  appropriately  corrected  for  relevant 
geophysical  effects  (e.g.,  distance  dependence),  provides  some  distinguishing  measure  of 
earthquakes and simple explosions. It is not necessary to know, a priori, what the quantitative
value  of  the  separation  is,  only  that  the  discriminant(s)  will  provide  some  separation.  In  other 
words,  the  outlier  analysis  cannot  provide  meaningful  results  in  the  absence  of  useful 
discriminants,  but  does  provide  a  robust  methodology  to  treat  multivariate  data  for  the  purpose  of 
quantifying anomalous events, including nuclear explosions, if such discriminants exist.

The regional event characterization parameters considered here are ratios of maximum
phase amplitude measurements, Pn/Lg and Pn/Sn, in the 2-4 Hz, 4-6 Hz, and 6-8 Hz bands. In
many cases, the events have missing data; one or more of the discriminants may be missing because of,
for example, blockage or strong attenuation of a particular seismic phase, or a poor signal-to-noise ratio.
It  is  also  possible  for  a  seismic  event  under  consideration  to  have  regional  measurements  at  more 
than  one  station.  In  such  a  case  it  is  possible  to  use  events  with  regional  discriminants  measured  at 
all  of  these  stations  to  compose  the  training  set.  The  multi-station  case  will  be  considered  in  the 
future.  Here  we  restrict  our  study  to  training  sets  with  regional  measurements  at  one  station  only. 
In practice, then, there is one training set for each station in the network. Events with at least one
measurement  of  the  six  discriminants  listed  above  are  candidates  for  the  training  sets.  To  prevent 
possible  contamination  by  mining  blasts,  only  events  above  mb  3.5  are  considered.  Only  those  of 
the  six  discriminants  with  SNR  greater  than  1.5  are  included  in  a  training  set.  As  more  data 
become  available,  a  more  restrictive  SNR  cut-off  may  be  used. 
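The selection rules above can be sketched in a few lines of Python. The sketch assumes a hypothetical record layout (an `Event` holding an mb value, an epicentral distance in degrees, and a dictionary mapping each discriminant label to its amplitude ratio and SNR); the labels are illustrative and are not PIDC database fields. A missing discriminant is simply an absent key.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the six discriminants: Pn/Lg and Pn/Sn in three bands.
DISCRIMINANTS = tuple(f"{ratio} {band} Hz"
                      for ratio in ("Pn/Lg", "Pn/Sn")
                      for band in ("2-4", "4-6", "6-8"))

MIN_MB = 3.5             # keep only events above mb 3.5 (mining-blast screen)
MAX_DISTANCE_DEG = 20.0  # "regional" means within 20 degrees of the station
MIN_SNR = 1.5            # per-discriminant SNR cut-off


@dataclass
class Event:
    """Hypothetical record for one event as seen at one station."""
    event_id: str
    mb: float
    distance_deg: float
    # discriminant label -> (amplitude ratio, SNR); absent labels are missing data
    measurements: dict = field(default_factory=dict)


def build_training_set(events):
    """Return {event_id: {discriminant: ratio}} for a single station."""
    training = {}
    for ev in events:
        if ev.mb <= MIN_MB or ev.distance_deg > MAX_DISTANCE_DEG:
            continue
        usable = {name: ratio
                  for name, (ratio, snr) in ev.measurements.items()
                  if name in DISCRIMINANTS and snr > MIN_SNR}
        if usable:  # at least one of the six discriminants survives the SNR cut
            training[ev.event_id] = usable
    return training


events = [
    Event("A", 4.1, 12.0, {"Pn/Lg 2-4 Hz": (0.8, 3.2), "Pn/Sn 6-8 Hz": (1.4, 1.2)}),
    Event("B", 3.2, 8.0, {"Pn/Lg 4-6 Hz": (0.6, 5.0)}),   # magnitude too small
    Event("C", 4.5, 25.0, {"Pn/Sn 2-4 Hz": (0.9, 4.0)}),  # beyond 20 degrees
    Event("D", 3.9, 5.0, {"Pn/Lg 4-6 Hz": (0.5, 1.1)}),   # no discriminant with SNR > 1.5
]
training = build_training_set(events)
print(training)
```

Only event A enters the training set, and only with its Pn/Lg 2-4 Hz value; its low-SNR Pn/Sn value is treated as missing rather than discarding the whole event.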

An  outlier  can  be  thought  of  as  a  measurement  that  is  inconsistent  with  the  measurements  of  some 
set  of  data,  in  our  case,  the  training  set.  It  is  useful  to  hypothesize  that  the  data  set  consists  of 
random  samples  from  some  (unknown)  distribution.  Typically,  the  distribution  is  of  the 
continuous  type,  with  all  (in  our  case,  positive)  values  possible  as  an  outcome  of  a  measurement. 
If this is the case, then a candidate outlier, whether a random sample from a different distribution
or from the same distribution as the training set, will have values that are possible values of the
training set distribution. In this case, the outlier must be defined as a point (in the six-dimensional
space of discriminant values) that is "far" from the center of the training set distribution, i.e., in
regions of low probability. In the univariate case, "far" means a discriminant value much larger or much
smaller than the mean, measured relative to the spread (standard deviation) of the distribution. In the
multivariate case, low-probability regions are much more complicated, because they depend on the correlations among the discriminants.
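The role of correlation can be illustrated with the squared Mahalanobis distance, a standard measure of how far a point lies from the center of a multivariate normal distribution. This is only an illustration of the geometry of low-probability regions, not the likelihood-ratio statistic described below, and the means, variances, and correlation are hypothetical values for two log-discriminants.

```python
def mahalanobis_sq_2d(point, mean, cov):
    """Squared Mahalanobis distance of a 2-D point from a distribution
    with the given mean vector and 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))  # inverse of a 2x2 matrix
    dx = (point[0] - mean[0], point[1] - mean[1])
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))


# Hypothetical pair of log-discriminants: unit variances, correlation 0.9.
mean = (0.0, 0.0)
cov = ((1.0, 0.9), (0.9, 1.0))

along = (1.5, 1.5)    # 1.5 standard deviations high in both, following the correlation
across = (1.5, -1.5)  # same univariate offsets, but against the correlation

print(mahalanobis_sq_2d(along, mean, cov))   # about 2.37: unremarkable
print(mahalanobis_sq_2d(across, mean, cov))  # about 45: far into the tail
```

Each point is 1.5 standard deviations from the mean in each coordinate, so neither would stand out in a univariate test, yet the second lies in a region of very low probability once the correlation is taken into account.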

Assuming  normality  of  the  discriminant  distributions  and  using  a  generalized  likelihood  ratio  test, 
Fisk  et  al.  (1993,  1994,  1995,  1996b)  have  shown  how  specifying  low  probability  regions  can  be 
quantified. As explained in more detail by Fisk et al. (1996b), an outlier is defined as a set of
discriminant values that yields a value of the generalized likelihood ratio, $\lambda$, less than a
threshold, $\lambda_\alpha$. The threshold is set such that $P[\lambda < \lambda_\alpha \mid H_0] = \alpha$ when the
event being tested actually belongs to the event population (i.e., when the null hypothesis, $H_0$, is
true), where $\alpha$ is the significance level of the test. That is, if the six values were a sample from the
training set distribution, $\lambda$ would be less than $\lambda_\alpha$ only $100\alpha\%$ of the time. Events with
likelihood ratios less than the threshold are considered outliers at the specified significance level.
We typically set $\alpha = 0.01$. The statistic $\lambda$ combines the multivariate discriminant data for the event
being tested and the training events into a single univariate quantity. It provides a useful metric with
which  to  perform  a  hypothesis  test  or  to  rank  events.  The  latter  provides  an  alternative  to  imposing 
a  rigid  “yes/no”  judgement  and  a  means  to  focus  on  the  most  anomalous  events. 
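To make the threshold definition concrete, the sketch below treats the simplest case: one discriminant, complete data, and an alternative hypothesis in which the test event's mean is shifted while the variance is shared with the training population. Under that assumption (ours, not necessarily the exact formulation of Fisk et al.), the generalized likelihood ratio reduces to $\lambda = \left[1 + \frac{n}{n+1}(x-\bar{y})^2/S\right]^{-(n+1)/2}$, where $\bar{y}$ and $S$ are the mean and the sum of squared deviations of the $n$ training values. Because $\lambda$ does not depend on the unknown mean and variance under $H_0$, $\lambda_\alpha$ can be estimated by Monte Carlo simulation from a standard normal distribution. The training values and event labels are hypothetical.

```python
import random
import statistics


def glr_univariate(x, training):
    """Generalized likelihood ratio for one test value x against a normal
    training sample, assuming a mean-shift alternative with common variance."""
    n = len(training)
    ybar = statistics.fmean(training)
    ss = sum((y - ybar) ** 2 for y in training)  # sum of squared deviations
    return (1.0 + n / (n + 1) * (x - ybar) ** 2 / ss) ** (-(n + 1) / 2)


def glr_threshold(n, alpha=0.01, trials=10000, seed=1):
    """Monte Carlo estimate of lambda_alpha with P[lambda < lambda_alpha | H0] = alpha.
    Under H0, lambda is free of the unknown mean and variance, so standard
    normal samples are sufficient."""
    rng = random.Random(seed)
    sims = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
        sims.append(glr_univariate(sample[0], sample[1:]))
    sims.sort()
    return sims[int(alpha * trials)]  # empirical alpha-quantile


# Hypothetical distance-corrected log10(Pn/Lg) values for an earthquake training set.
training = [0.10, -0.05, 0.20, 0.00, 0.15, -0.10, 0.05, 0.12, -0.02, 0.08]
test_events = {"E1": 0.07, "E2": 0.90}

lam_alpha = glr_threshold(len(training), alpha=0.01)
scores = {name: glr_univariate(x, training) for name, x in test_events.items()}

ranked = sorted(scores, key=scores.get)  # most anomalous (smallest lambda) first
outliers = [name for name in ranked if scores[name] < lam_alpha]
print(ranked, outliers)
```

The sorted list gives the ranking described above, and the comparison with `lam_alpha` gives the yes/no decision; with these values only E2 is flagged.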

In  order  for  the  outlier  test  to  provide  meaningful  results,  each  event  in  the  training  set  should  be  a 
sample  from  the  same  distribution.  However,  the  events  that  make  up  a  given  training  set  will  have 
occurred  at  various  locations,  in  different  tectonic  subregions,  and  have  propagation  paths  with 
different  geophysical  effects  on  the  observed  regional  signals.  In  particular,  events  will  have 
different  distances  from  the  station.  Since  the  various  phases  (Pn,  Lg,  Sn)  exhibit  different  rates  of 
attenuation, the ratios Pn/Lg and Pn/Sn vary with epicentral distance. From a statistical
perspective, these effects can cause events at some locations in a training set to follow distributions
(with different means and covariance matrices) that differ from those of events at other locations in the
same training set. From a geophysical or monitoring perspective, these effects, if
left  untreated,  can  lead  to  inaccurate  screening  or  identification  results,  i.e.,  potential  missed 
violations.  Fisk  et  al.  (1994,  1996b)  discuss  cases,  including  a  Lop  Nor  nuclear  explosion,  in 
which  events  would  be  misidentified  if  distance  corrections  were  not  applied  to  the  discriminants. 

It  is  also  possible  that  the  discriminant  distributions  depend  on  magnitude.  However,  evidence 
suggests  that  linearity  is  a  good  approximation,  so  that  the  Pn/Lg  and  Pn/Sn  discriminants  do  not 
exhibit  significant  dependence  on  magnitude.  Thus,  it  will  be  assumed  in  this  study  that  the  effect 
of  magnitude  on  the  distribution  of  regional  amplitude  ratios  is  negligible  relative  to  distance  and 
other  path  effects. 

Here  we  focus  on  the  effect  of  epicentral  distance  on  regional  discriminants  and  leave  other 
considerations  for  future  studies.  In  a  relatively  uniform  region,  Pn/Lg  and  Pn/Sn  will  not  depend 
significantly  on  event-to-station  azimuth.  Thus,  the  simplest  corrections  are  those  which  depend 
only  on  distance.  If  this  function  of  distance  were  known,  then  it  would  be  a  simple  matter  to 
correct  each  discriminant  so  that  the  corrected  training  set  would  contain  events  which  were  all 
samples  from  the  same  distribution.  Since  this  function  of  distance  is  not  known,  a  priori,  it  must 
be  estimated  from  the  data.  It  has  been  suggested  (e.g.,  Sereno,  1990)  that  a  function  of  the  form 

$$\mathrm{Pn/Lg}(f) = \beta(f)\,\left(\Delta/\Delta_0\right)^{\alpha(f)} \tag{1}$$

approximates the distance dependence, where $\Delta$ is the epicentral distance. Similar dependence is
assumed for Pn/Sn, but with different coefficients. The unknown coefficients, $\alpha$ and $\beta$, which
depend on the frequency band, $f$, can be estimated using a least-squares procedure, and the
constant $\Delta_0$ is an arbitrary reference distance. If the logarithm is taken of both sides of Eq. (1), the
resulting equation is linear in $\log\Delta$. If Eq. (1) is valid, then a data plot of $\log(\mathrm{Pn/Lg})$ versus
$\log\Delta$ would approximate a straight line with slope $\alpha$. Once $\alpha$ has been estimated, the
discriminant is corrected using the equation

$$\mathrm{Pn/Lg}(f)_{\mathrm{corrected}} = \left(\Delta/\Delta_0\right)^{-\hat{\alpha}(f)}\,\mathrm{Pn/Lg}(f)_{\mathrm{uncorrected}} \tag{2}$$

where $\hat{\alpha}$ is the least-squares estimate of $\alpha$. In Section 4, this formula will be used to compute
regional  distance  corrections  at  each  of  the  Primary  and  Auxiliary  seismic  stations  for  which  there 
are  sufficient  data,  based  on  the  IMS  data  sets  described  in  the  following  section. 
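A minimal sketch of the correction, assuming the slope estimate $\hat{\alpha}$ is already available. The function name is hypothetical, and the default reference distance of 1500 km is the value adopted in Section 4. The check at the end confirms that a ratio generated exactly by the power law of Eq. (1) is mapped back to $\beta$, its value at the reference distance.

```python
def distance_correct(ratio, distance_km, alpha_hat, ref_distance_km=1500.0):
    """Scale an amplitude ratio to the reference distance:
    corrected = (distance / ref) ** (-alpha_hat) * uncorrected."""
    return (distance_km / ref_distance_km) ** (-alpha_hat) * ratio


# Synthetic check with hypothetical alpha = 1.2 and beta = 0.5.
alpha, beta = 1.2, 0.5
for d in (600.0, 1500.0, 2200.0):
    raw = beta * (d / 1500.0) ** alpha  # ratio predicted by the power law
    print(d, raw, distance_correct(raw, d, alpha))  # corrected value equals beta
```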
3.  IMS Regional Seismic Data

At  the  time  of  this  study,  January  1997,  there  were  35  Primary  and  51  Auxiliary  seismic  stations 
in  the  IMS  network;  these  IMS  stations  are  considered  in  this  report.  Note,  however,  that  this  list 
changes  periodically.  The  14  arrays  and  21  three-component  (3-C)  stations  of  the  Primary 
network  are  listed  in  Table  1  and  the  4  arrays  and  32  three-component  (3-C)  stations  of  the 
Auxiliary network are listed in Table 2. The numbers for the Auxiliary network exclude 15
stations (indicated by a # in Table 2), which are accessed by modem, are not presently used by the
IDC except under special circumstances, and for which there are currently no data available in
the PIDC archive database (IDC Performance Report, 28 January 1997).

Figure  1  shows  the  number  of  regional  events  for  each  Primary  station  between  10  September 
1995 (950910) and 15 January 1997 (970115). A regional event is defined as one within 20
degrees of the station with at least one of the six discriminants (Pn/Lg and Pn/Sn in the 2-4, 4-6,
and 6-8 Hz bands) measured with SNR greater than 1.5 (depicted by the hashed bars). Also
indicated in the plot, by the solid bars, are those events with at least one measurement with SNR
greater than 2.0. Figure 2 is a similar plot for the Auxiliary stations. Over the 16 months represented
in these plots, the station with the most events, WRA, detects about 30 events per month with
SNR > 1.5, while some stations average fewer than one event per month. About 60% of the events
with SNR > 1.5 also have SNR > 2.0.

Figure  3  shows  the  locations  of  all  on-line  Primary  and  Auxiliary  stations.  The  stations  depicted 
by  green  squares  have  at  least  20  regional  events  with  at  least  one  discriminant  with  SNR  >  2.0. 
Stations depicted by blue triangles have at least 20 regional events with SNR > 1.5 but fewer than 20
events with SNR > 2.0. Stations that have fewer than 20 regional events are depicted by red
circles. There are 32 green stations and another 19 blue ones, leaving 35 of the 86
stations with fewer than 20 regional events in this 16-month period (depicted by the red circles).
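The Figure 3 classification can be expressed as a small function. The input format (one list of discriminant SNRs per regional event at a station) and the function name are assumptions made for this sketch.

```python
def station_category(event_snrs, min_events=20):
    """Classify a station as in Figure 3.

    event_snrs: for each regional event at the station, the SNRs of the
    discriminants measured for that event.
    """
    n_snr_15 = sum(1 for snrs in event_snrs if any(s > 1.5 for s in snrs))
    n_snr_20 = sum(1 for snrs in event_snrs if any(s > 2.0 for s in snrs))
    if n_snr_20 >= min_events:
        return "green"  # at least 20 events with a discriminant at SNR > 2.0
    if n_snr_15 >= min_events:
        return "blue"   # at least 20 events at SNR > 1.5, but not 20 at SNR > 2.0
    return "red"        # fewer than 20 regional events


print(station_category([[2.5, 1.7]] * 25))  # green
print(station_category([[1.8]] * 25))       # blue
print(station_category([[3.0]] * 12))       # red

# The three categories account for every station considered: 32 + 19 + 35 = 86.
assert 32 + 19 + 35 == 86
```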

Figure  4  is  a  histogram  of  the  number  of  regional  events  above  mb  3.5  as  a  function  of  the  number 
of  discriminants  measured  (there  can  be  one  through  six  discriminants  measured).  Figure  5  is  a 
similar  plot  giving  the  percentage  of  the  total  of  2797  regional  events  in  the  16-month  period  with 
one  through  six  discriminants  measured.  As  seen  in  the  plots,  most  events  have  only  one,  two  or 
three discriminants measured. Fewer than 5% of the events have measurements with SNR > 1.5 for
all six discriminants, meaning that more than 95% of regional events have some missing data. It is
important to determine systematically the physical effects (e.g., blockage, attenuation) that
lead to this. This is an involved and complicated issue for future study.
Table 1. Primary seismic stations.

Code    Latitude  Longitude   Station Name, Location            Type
ABKT     37.9304    58.1189   Alibek, Turkmenistan              3C
ARCES    69.5349    25.5058   ARCESS Array, Norway              Array
ASAR    -23.6664   133.9044   Alice Springs Array, Australia    Array
BDFB    -15.6440   -48.0141   Brasilia, Brazil                  3C
BGCA      5.1761    18.4242   Bogoin, Central African Republic  3C
BJT      40.0183   116.1679   Baijiatuan, China                 3C
BOSA    -28.6137    25.2559   Boshof, South Africa              3C
CMAR     18.4575    98.9429   Chiang Mai Array, Thailand        Array
CPUP    -26.3306   -57.3292   Villa Florida, Paraguay           3C
DBIC      6.6701    -4.8563   Dimbroko, Ivory Coast             3C
ESDC     39.6755    -3.9617   Sonseca Array, Spain              Array
FINES    61.4436    26.0771   FINESS Array, Finland             Array
GERES    48.8451    13.7016   GERESS Array, Germany             Array
HIA      49.2667   119.7417   Hailar, China                     3C
ILAR     64.7714  -146.8866   Eielson Array, Alaska             Array
KBZ      43.7286    42.8975   Khabaz, Russia                    3C
KSAR     37.4421   127.8844   Wonju Array, South Korea          Array
LOR      47.2683     3.8589   Lormes, France                    3C
LPAZ    -16.2879   -68.1307   La Paz, Bolivia                   3C
MAW     -67.6039    62.8706   Mawson, Antarctica                3C
MJAR     36.5427   138.2070   Matsushiro Array, Japan           Array
MNV      38.4328  -118.1531   Mina, Nevada                      3C
NORES    60.7353    11.5414   NORESS Array, Norway              Array
NRI      69.0061    87.9964   Norilsk, Russia                   3C
PDAR     42.7667  -109.5579   Pinedale Array, Wyoming           Array
PDY      59.6333   112.7003   Peleduy, Russia                   3C
PLCA    -40.7306   -70.5500   Paso Flores, Argentina            3C
SCHQ     54.8319   -66.8336   Schefferville, Canada             3C
STKA    -31.8769   141.5952   Stephens Creek, Australia         3C
TXAR     29.3338  -103.6670   TXAR Array, Texas                 Array
ULM      50.2486   -95.8755   Lac du Bonnet, Canada             3C
VNDA    -77.5139   161.8456   Vanda, Antarctica                 3C
WRA     -19.9426   134.3394   Warramunga Array, Australia       Array
YKA      62.4932  -114.6053   Yellowknife Array, Canada         Array
ZAL      53.6167    84.7917   Zalesovo, Russia                  3C
Table 2. Auxiliary seismic stations.

Code    Latitude  Longitude   Station Name, Location
        -13.9093  -171.7773   Afiamalu, Western Samoa
         34.9425  -106.4575   Albuquerque, New Mexico
AQU#     42.3540              L’Aquila, Italy
ARU      56.4302    58.5625   Arti, Russia
BBB      52.1847  -128.1133   Bella Bella, Canada
BORG     64.7474   -21.3268   Borgarfjordur, Iceland
CTA     -20.0885   146.2540   Charters Towers, Australia
DAV#      7.0878   125.5747   Davao, Philippines
DAVOS    46.8394     9.7943   Davos, Switzerland
DLBC     58.4372  -130.0272   Dease Lake, Canada
EKA      55.3332    -3.1588   Eskdalemuir Array, Scotland
ELK      40.7448  -115.2388   Elko, Nevada
FITZ    -18.1030   125.6430   Fitzroy Crossing, Australia
FRB      63.7467              Iqaluit, Canada
HFS                 13.6968   Hagfors Array, Sweden
HNR#     -9.4322   159.9471   Honiara, Solomon Islands
INK      68.3067  -133.5200   Inuvik, Canada
ISG      24.3800   124.2300   Ishigaki-jima, Japan
JER      31.7719    35.1972   Jerusalem, Israel
JTS      10.2908              Las Juntas de Abangares, Costa Rica
KIEV#    50.6944    29.2083   Kiev, Ukraine
KKJ      41.7800   140.1800   Kaminokuni, Japan
KVAR     43.9557    42.6952   Kislovodsk Array, Russia
LSZ#                28.1882   Lusaka, Zambia
MBC      76.2420  -119.3600   Mould Bay, Canada
MLR      45.4917    25.9437   Muntele Rosu, Romania
MSEY     -4.6737              Mahe, Seychelles
NEW      48.2633  -117.1200   Newport, Washington
NIL      33.6500    73.2512   Nilore, Pakistan
NNA     -11.9875   -76.8422   Nana, Peru
NWAO#              117.2333   Narrogin, Australia
OBN      55.1167    36.6000   Obninsk, Russia
OGS      27.0500   142.2000   Ogasawara, Japan
PFO      33.6092  -116.4550   Pinon Flat, California
PMG#     -9.4092   147.1539   Port Moresby, New Guinea
PTGA#              -59.9666   Pitinga, Brazil
RAR#    -21.2125  -159.7733   Rarotonga, Cook Islands
RPN     -27.1267  -109.3344   Rapanui, Easter Island
SADO     44.7694   -79.1417   Sadowa, Canada
SDV#      8.8790   -70.6330   Santo Domingo, Venezuela
SFJ#     66.9967   -50.6152   Sondre Stromfjord, Greenland
SHK      34.5300   132.6800   Shiraki, Japan
SNZO#   -41.3103   174.7046   South Karori, New Zealand
SPITS    78.1777    16.3700   Spitsbergen Array, Norway
SUR     -32.3797    20.8117   Sutherland, South Africa
TKL      35.6580              Tuckaleechee Caverns, Tennessee
TSK      36.2108   140.1097   Tsukuba, Japan
TSUM#   -19.2022    17.5838   Tsumeb, Namibia
ULN      47.8652   107.0528   Ulaanbaatar, Mongolia
VRAC     49.3083    16.5935   Vranov, Czech Republic

# Accessed by modem; not presently used by the IDC except under special circumstances, with no data currently available in the PIDC archive database.
Figure 3. Locations of Primary and Auxiliary stations. Green squares: at least 20 regional events with at
least one discriminant with SNR > 2.0; blue triangles: at least 20 regional events with at least one
discriminant with SNR > 1.5; red circles: fewer than 20 regional events.

An event often produces regional measurements at more than one station, and one event had
regional measurements at as many as ten stations. Figure 6 plots the distribution
of events as a function of the number of stations with at least one discriminant measured with
SNR > 1.5. Also plotted is the distribution for which at least one discriminant is measured with SNR >
2.0. Recall that a regional event is defined to be within 20 degrees (about 2220 km) of the station. As
can be seen in Figure 6, about 67% of the regional events are observed by only one station.
Therefore, about one-third of the regional events are observed by more than one station, with
about 20% observed at two stations. (Note: by observed we mean that at least one discriminant
with SNR > 1.5 can be measured, which requires Pn in one of the 2-4 Hz, 4-6 Hz, or 6-8 Hz
frequency bands with SNR > 1.5 and either Lg or Sn in the same frequency band with SNR > 1.5.)
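The definition of "observed" in the note above can be written as a short predicate. The per-phase SNR dictionary keyed by (phase, band), the function names, and the SNR values in the example are hypothetical inputs, not the PIDC data model.

```python
BANDS = ("2-4", "4-6", "6-8")  # Hz
MIN_SNR = 1.5


def measurable_discriminants(phase_snr):
    """Return the discriminants whose two phases both exceed MIN_SNR in the same band.

    phase_snr maps (phase, band) -> SNR for one event at one station,
    e.g. {("Pn", "2-4"): 3.0, ("Lg", "2-4"): 2.1}.
    """
    found = set()
    for band in BANDS:
        if phase_snr.get(("Pn", band), 0.0) <= MIN_SNR:
            continue  # no usable Pn in this band
        for phase in ("Lg", "Sn"):
            if phase_snr.get((phase, band), 0.0) > MIN_SNR:
                found.add(f"Pn/{phase} {band} Hz")
    return found


def observing_stations(event):
    """Count stations that observe an event; event maps station -> phase_snr."""
    return sum(1 for snr in event.values() if measurable_discriminants(snr))


event = {
    "WRA": {("Pn", "2-4"): 3.0, ("Lg", "2-4"): 2.0, ("Sn", "2-4"): 1.2},
    "ASAR": {("Pn", "4-6"): 1.4, ("Sn", "4-6"): 5.0},  # Pn too weak
    "STKA": {("Pn", "6-8"): 2.2, ("Sn", "6-8"): 1.9},
}
print(observing_stations(event))  # 2
```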

Figures  7  and  8  plot  the  distributions  of  events  as  a  function  of  the  number  of  stations,  but  for  each 
of  the  six  discriminants  individually.  Figure  7  includes  each  of  the  six  discriminants  on  the  same 
plot, whereas Figure 8 has six separate plots, one for each discriminant. These plots show that
each of the six discriminants is measured with approximately the same frequency, on average;
no one discriminant is measured much more often than the others.

In  the  remainder  of  the  report  we  describe  how  these  data  are  distance-corrected  and  how  we 
categorize each station in terms of the utility of its training set and distance corrections.
Figure 4. Number of regional events with one through six discriminants measured.

Figure 5. Percentage of regional events with one through six discriminants measured.

Figure 6. Distribution of regional events as a function of the number of observing stations per event for
Primary and Auxiliary stations combined.

Figure 7. Distribution of regional events by individual discriminant (combined).

Figure 8. Distribution of regional events by individual discriminant (separate).


4.  Distance  Corrections 


As  described  in  Section  2,  a  training  set  is  most  useful  when  it  is  composed  of  events  all  from  the 
same distribution. In this case a test event either lies in a low-probability region of the test space and
is considered an outlier, i.e., it belongs to another population (such as explosions), or it lies in a
high-probability region and is considered a sample from the training set population (earthquakes).

Previous  studies  have  provided  convincing  evidence  that  if  events  at  substantially  different 
distances  are  to  be  combined  into  the  same  training  set,  distance  corrections  are  essential.  At 
station  WMQ  in  China,  Fisk  et  al.  (1996b)  found  that  distance  dependencies  are  large  enough  to 
shift  the  mean  of  the  regional  discriminants  for  the  earthquake  population  at  one  distance  to  the 
mean  of  the  explosion  population  at  another  distance.  Similar  phenomena  have  also  been 
discussed  by  Fisk  et  al.  (1994,  1995)  for  events  detected  at  ARCES. 

Further, all the experience of which we are aware suggests that the slope, $\alpha(f)$, in Eq. (1), will
typically  be  positive  (e.g.,  Baumgardt  and  Der,  1994;  Fisk  et  al.,  1995,  1996b;  Sereno,  1990).  As 
we  shall  describe  in  detail  below,  a  straightforward  fitting  procedure  sometimes  produces  negative 
or  unusually  large  positive  values  when  applied  to  the  IMS  data  used  in  this  study.  We  are  not 
confident  in  the  validity  of  some  of  the  results  we  have  obtained.  Below  we  propose  an  interim 
procedure  which  we  apply  to  the  IMS  data  here,  but  we  shall  continue  studies  either  to  validate 
the  procedure  or,  as  we  believe  may  be  necessary,  propose  a  modified  procedure. 

It is possible that the negative values of $\alpha(f)$ we have obtained are correct; we know of no theory
which  precludes  that  possibility.  It  is  perhaps  more  likely  that  our  results  are  due  to  mixing  events 
which  cannot  be  mapped  onto  a  single  distribution  by  a  relation  of  the  form  given  by  Eq.  (1).  That 
would  be  the  case,  for  example,  if  there  is  a  pronounced  azimuth  dependence  to  the  distribution 
functions  for  the  regional  discriminants.  In  that  case,  the  data  samples  we  use  for  the  fit  are  really 
drawn  from  a  mixture  of  distributions  and  fitting  the  results  to  Eq.  (1)  could  produce  results  of  the 
type  we  have  observed  even  if,  along  any  given  azimuth,  Eq.  (1)  is  a  very  good  approximation 
with a positive value of $\alpha(f)$, but with different parameter values for different azimuths. Similar
considerations  would  apply  if  the  propagation  characteristics  change  when  the  signal  passes  from 
one  tectonic  subregion  to  another  along  the  same  azimuth. 

As  data  from  the  IMS  network  have  accumulated  and  we  have  come  to  better  understand  some 
aspects  of  the  data  structures  that  the  system  will  have  to  analyze,  we  have  become  more 
interested  in  the  possibility  of  choosing  subsets  of  events  seen  at  a  given  station  to  form  training 
sets  more  appropriate  for  testing  a  given  new  event,  than  using  the  entire  set  of  regional  events 
seen at that station would be. One possible procedure of this type would be to form a training set
from events whose distance from the station is not very different from that of the event to be
tested. Some of the results we shall present on attempting to fit distance corrections point out the
desirability of such a procedure if one can be developed. We present some preliminary results
from these studies below. It seems likely that many events, perhaps a large majority, can be
tested using localized training sets for which distance dependence plays an insignificant role.
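One simple way to localize a training set, sketched here under our own assumptions, is to keep only training events whose distances lie within a fixed factor of the test event's distance. If Eq. (1) holds with slope $\alpha$, a window of half-width $w$ in $\log_{10}\Delta$ limits the uncorrected distance effect on $\log_{10}(\mathrm{Pn/Lg})$ to at most $|\alpha| w$. The window half-width of 0.1 (a factor of about 1.26 in distance), the function names, and the record format are hypothetical choices, not values from this study.

```python
import math


def localized_training_set(test_distance_km, training, half_width=0.1):
    """Keep training events with |log10(distance / test distance)| <= half_width."""
    return [ev for ev in training
            if abs(math.log10(ev["distance_km"] / test_distance_km)) <= half_width]


def max_distance_bias(alpha, half_width=0.1):
    """Largest shift in log10(ratio) that Eq. (1) allows across the window."""
    return abs(alpha) * half_width


training = [
    {"id": "a", "distance_km": 900.0},
    {"id": "b", "distance_km": 1100.0},
    {"id": "c", "distance_km": 1400.0},
    {"id": "d", "distance_km": 400.0},
]
local = localized_training_set(1000.0, training)
print([ev["id"] for ev in local])  # ['a', 'b']
print(max_distance_bias(1.5))      # about 0.15
```

The trade-off is sample size: a narrower window reduces the residual distance effect but leaves fewer training events, which matters at stations with few regional events.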

In Figure 9, plots are shown for each discriminant (Pn/Lg and Pn/Sn in the 2-4, 4-6, and 6-8 Hz
bands) of the log (base 10) of the discriminant value versus the log (base 10) of the distance (in
km) from station CMAR for each event in the raw training set. In the plots, the size of the cross-shaped
marker is proportional to the SNR of that measurement. All events in the set have
magnitude mb > 3.5 to prevent potential contamination from mining blasts.


Figure  9.  Log-log  scatterplots  of  discriminant  versus  distance  for  station  CMAR. 
In general, smaller values of each discriminant are associated with smaller distances from the
station. It is clear that for a fixed distance from the station there is considerable spread in the
values. On average, however, the mean of the values at a fixed distance is an
increasing function of distance, since Lg and Sn amplitudes typically attenuate faster with
distance than the corresponding Pn amplitudes. If the relationship of log Pn/Sn or log Pn/Lg with log
distance is approximately linear, then the two parameters describing that straight line can be
estimated using standard statistical regression analysis. Statistical analysis can quantify, in a
certain sense, how well a line fits a set of points, but only physical reasoning and visual inspection
can determine whether the linear fit is actually appropriate. It could be the case that no single function of
distance describes the entire set of points. For example, the set of points could be composed of
samples from two or more populations of events at various distances, possibly along two different
directions for which the relative attenuation of Pn and Lg (and Sn) differs. This would, in
principle, yield plots with points that should be separated into two or more sets, each with a possibly
different function describing its mean as a function of distance. In our case we will determine
whether a straight-line fit is a good assumption by examining the scatterplots for each station.

For each of the Pn/Lg plots in Figure 9 in the three frequency bands, 2-4, 4-6, and 6-8 Hz, we now fit a straight line of the form

$$\log_{10}\left[\mathrm{Pn/Lg}(f)\right] = \alpha(f)\,\log_{10}\left(\Delta/\Delta_0\right) + \log_{10}\beta(f) \qquad (3)$$

which is the logarithm of Eq. (1). We also fit the three Pn/Sn plots with a similar equation. Here, $\Delta$ is the epicentral distance and the constant $\Delta_0$ is a reference distance to which the discriminant values are corrected. We arbitrarily set $\Delta_0$ to 1500 km, roughly the mean epicentral distance of the 2797 regional events considered in this study. The unknown coefficients, $\alpha(f)$ and $\beta(f)$, are estimated for each frequency band using weighted least squares, with weights proportional to the signal-to-noise ratio. Thus, events with high SNR have more influence on the linear fits than those with poor SNR.
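The weighted least-squares fit of Eq. (3) can be sketched in a few lines of Python. This is a minimal illustration, not the software used in this study: the function name `fit_distance_slope` is hypothetical, the weights are set equal to the SNR values (the text only specifies proportionality, and a common scale factor cancels), and base-10 logarithms are assumed to match the plots.

```python
import math


def fit_distance_slope(distances_km, ratios, snr, ref_km=1500.0):
    """Hypothetical helper: weighted least-squares fit of
    log10(ratio) = alpha * log10(d / ref_km) + log10(beta).

    Weights are the SNR values; only their relative sizes matter.
    Returns (alpha, log10_beta).
    """
    if not (len(distances_km) == len(ratios) == len(snr)):
        raise ValueError("inputs must have equal length")
    x = [math.log10(d / ref_km) for d in distances_km]
    y = [math.log10(r) for r in ratios]
    w = list(snr)
    sw = sum(w)
    # Weighted means of log-distance and log-ratio.
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted cross-product and sum of squares about those means.
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("all events at the same distance; slope undefined")
    alpha = sxy / sxx
    return alpha, ybar - alpha * xbar


# Synthetic check: ratios that follow beta * (d / 1500)**alpha exactly.
dists = [300.0, 600.0, 1200.0, 1800.0, 2200.0]
obs = [0.8 * (d / 1500.0) ** 1.5 for d in dists]
alpha_hat, log_beta_hat = fit_distance_slope(dists, obs, snr=[2.0, 5.0, 3.0, 8.0, 1.6])
print(round(alpha_hat, 6), round(10 ** log_beta_hat, 6))
```

With noise-free synthetic data the weights do not matter and the fit returns the generating slope and intercept; with real data, high-SNR events pull the line toward themselves.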

Figure 10 shows the same scatterplots as Figure 9 with the best-fit lines superimposed. The estimated slope of each line is significant at the 0.01 level, using a test based on Student's t-statistic, which assumes that the errors are normally distributed. In other words, if the true slope were zero and the errors were normally distributed, an estimated slope would exceed the threshold derived from the t-distribution by chance only 1% of the time. The data can now be corrected by applying

$$\mathrm{Pn/Lg}(f)_{\mathrm{corrected}} = \left(\Delta/\Delta_0\right)^{-\hat{\alpha}(f)}\,\mathrm{Pn/Lg}(f)_{\mathrm{uncorrected}}$$

to the discriminant values (and likewise to the Pn/Sn values), where $\hat{\alpha}(f)$ is the estimate of $\alpha(f)$. The intercept parameter, $\beta(f)$, is a constant for each discriminant and has no effect on discrimination.
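Applying the correction is a single multiplication per event. The sketch below uses a hypothetical helper, `correct_ratio`, with $\Delta_0$ = 1500 km as in the text. It shows that values lying exactly on the power law whose logarithm is Eq. (3) all collapse to $\beta(f)$ after correction, which is why the intercept does not matter for discrimination: it is the same constant for every event.

```python
def correct_ratio(ratio, distance_km, alpha_hat, ref_km=1500.0):
    """Hypothetical helper: scale a discriminant value to the reference distance.

    Implements corrected = (distance / ref_km) ** (-alpha_hat) * uncorrected.
    """
    if distance_km <= 0 or ref_km <= 0:
        raise ValueError("distances must be positive")
    return (distance_km / ref_km) ** (-alpha_hat) * ratio


# Synthetic values generated from beta * (d / 1500)**alpha, alpha = 1.5, beta = 0.8.
alpha, beta = 1.5, 0.8
for d in (300.0, 1500.0, 2200.0):
    raw = beta * (d / 1500.0) ** alpha
    print(d, round(raw, 4), round(correct_ratio(raw, d, alpha), 4))
```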
Figure 10. Log-log scatterplots of discriminant vs. distance with lines of best-fit for station CMAR.


Each slope in Figure 10 is a positive number between approximately 1 and 2. This corresponds to a power-law relationship between distance and Pn/Lg or Pn/Sn that falls between linear and quadratic (see Eq. (1)). The positive slopes indicate that Lg and Sn fall off with a higher power of distance than Pn does.
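As a worked illustration of what these slopes mean, assume Eq. (1) is the power law whose logarithm is Eq. (3), that is, $\mathrm{Pn/Lg}(f) = \beta(f)\,(\Delta/\Delta_0)^{\alpha(f)}$. Doubling the epicentral distance then multiplies the discriminant by $2^{\alpha(f)}$, independent of $\beta(f)$ and $\Delta_0$:

$$\frac{\mathrm{Pn/Lg}(f;\,2\Delta)}{\mathrm{Pn/Lg}(f;\,\Delta)} = \frac{\beta(f)\,(2\Delta/\Delta_0)^{\alpha(f)}}{\beta(f)\,(\Delta/\Delta_0)^{\alpha(f)}} = 2^{\alpha(f)}, \qquad 2^{1} = 2, \quad 2^{1.5} \approx 2.83, \quad 2^{2} = 4.$$

So for slopes of roughly 1 to 2, as at CMAR, the ratio grows by a factor of about 2 to 4 for each doubling of distance, while a slope of 3 would correspond to a factor of 8.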

When all six discriminants show this expected behavior (positive slopes less than 3) and there are at least 20 events in the training set, we refer to the station as belonging to Category 1. (This and the other categories are defined more precisely in Section 6.) A number of stations in the IMS network have the property that all six discriminants have positive slopes less than 3, including nine Primary stations: CMAR, CPUP, ESDC, GERES, ILAR, KSAR, MJAR, NORES and PDY. Plots for GERES, KSAR, MJAR and NORES are shown in Figures 11-14, respectively.
Figure 13. Log-log scatterplots of discriminant vs. distance with lines of best-fit for station MJAR.


Figure 14. Log-log scatterplots of discriminant vs. distance with lines of best-fit for station NORES.




There are, however, many stations, even some with sufficient data, that have unexpected (or anomalous) slopes, mostly because one or more of the discriminant-versus-distance plots yield negative slopes. One such Primary station is TXAR, whose discriminant-versus-distance plots are shown in Figure 15. In this case, the slopes for Pn/Sn (2-4 Hz) and Pn/Sn (4-6 Hz) are slightly negative, although there are too few data points for these discriminants, over too limited a distance range, to estimate these slopes reliably. Additional data must be collected for TXAR, and for other stations with similar data limitations, before reliable distance corrections can be estimated and used.


Figure 15. Log-log scatterplots of discriminant vs. distance with lines of best-fit for station TXAR.


A more convincing example of a station with negative slopes in its discriminant-versus-distance plots is LPAZ (Figure 16). Pn/Lg and Pn/Sn in the 4-6 Hz and 6-8 Hz bands all have negative slopes, and the estimated slope for Pn/Lg (4-6 Hz) is significant at the 0.01 level according to the t-statistic test. Clearly, more data and investigation of subregional variations are needed.
Figure 16. Log-log scatterplots of discriminant vs. distance with lines of best-fit for station LPAZ.
Another set of stations that cannot be placed in Category 1 (cf. Section 6) are the stations located in Australia, ASAR and WRA. As plotted in Figures 17 and 18, both stations show a number of discriminants with highly significant slopes greater than 3. An important feature of both stations, however, is that all of the regional events above mb 3.5 fall within a relatively narrow distance range from the station. This is because almost all of the events above mb 3.5 observed at the Australian stations occur in the tectonically active region near Indonesia. Since any test event above mb 3.5 observed at these stations will almost certainly come from this same active region, distance corrections are likely to be unnecessary: the test event will be at roughly the same distance as all of the events in the training set. Stations with this feature are more likely to have suspicious distance corrections than stations whose events span a wide range of regional distances, but the outlier analysis can usually still be applied effectively. Stations such as ASAR and WRA will be termed Category 2, defined more precisely below.
Finally, we examine stations for which some of the discriminant-versus-distance plots have negative slopes that become positive when events with mb < 3.5 are included. Examples of such stations are ARCES, LOR and FINES. Notice in Figure 19 that each of the three Pn/Lg discriminants for the training set at ARCES has a negative slope. Figure 20 plots the same six discriminants for all events observed over the same time period at ARCES, including those with mb < 3.5 and those without mb measurements. Clearly, there are many more events at ARCES with mb < 3.5 or no mb than events above mb 3.5. Although magnitude alone cannot tell us how many of the events below mb 3.5 are mining blasts, it may be that most of them are. Nevertheless, as can be seen in Figure 20, all of the estimated slopes are now positive. Without further in-depth study, it is not known whether Figure 20 contains mostly mining blasts whose character differs sufficiently from the earthquakes in Figure 19 to produce the positive slopes, or whether the change in slope is simply due to the much larger number of data samples. Figures 21 and 22 show a similar change in slopes for station LOR.

Although the distance corrections obtained in Figures 20 and 22 have slopes with the expected behavior, there is sufficient reason to suspect that they are not the true distance corrections for training sets restricted to mb > 3.5. Note that if mining blasts and earthquakes are typically at different distances, including both can significantly and adversely affect the estimated slopes. These suspicions are strong enough that we do not place stations such as ARCES and LOR in Category 1. Instead, they are placed in Category 3, the category for stations with contradictory slopes, defined more precisely below.

There are several possible reasons why some regional discriminants have distance dependence of an unexpected character. The problem is not well understood theoretically, and there is evidence that the form of dependence assumed here may not always be justified (e.g., Kennett, 1991, 1992). It may be that the unusual estimated slopes are accurate, in which case more data in the future should strengthen the evidence for them. Another possibility is that the discriminant ratios depend on other variables in addition to distance, such as direction from the station (i.e., azimuth) and the geophysical properties of the propagation path. Further detailed study of the data is needed to determine whether this is the case and what the necessary corrections should be. For now, using only uniform distance corrections, we treat with suspicion those discriminants with negative or abnormally steep slopes.

Before precisely defining the station categories and assigning each Primary and Auxiliary station to a category, we discuss the distributions of the regional discriminants at each station and outlier removal, whereby anomalous events are removed from the regional data sets and the distance corrections are iteratively recomputed.
Figure 19. Log-log scatterplots of discriminant vs. distance with lines of best-fit for station ARCES.

Figure 20. Discriminant vs. distance with lines of best-fit, including events with mb < 3.5, for ARCES.
5. Distributions and Outlier Removal


If a suitable set of distance corrections for all six discriminants has been estimated for a training set and applied to all events, then each corrected event in the training set will, in principle, be a sample from the same population. A test event, distance-corrected in the same manner, can then be analyzed with respect to this corrected training set and judged to be an outlier, at a specified significance level, if its generalized likelihood ratio is less than a threshold value computed from the corrected training set. The threshold is typically set so that, if the test event truly came from the same population as the training samples, the probability that its likelihood ratio falls below the threshold is 0.01. For each value of the likelihood ratio, $\lambda$, we can define a number, $P$, referred to as the P-value, equal to the probability that the likelihood ratio lies between 0 and $\lambda$ (the likelihood ratio must be positive). The threshold value $\lambda_\alpha$, with $\alpha = 0.01$, then has P-value $P_\alpha = \alpha = 0.01$, and an outlier has a P-value less than $\alpha = 0.01$. In a previous report, Fisk et al. (1995) categorized regional events using the P-value as a scoring metric.
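To make the link between $\lambda$ and the P-value concrete, consider the simplest case of a single discriminant. Assume, for illustration only (this is not necessarily the exact formulation of Fisk et al., 1993), that the training values $x_1, \dots, x_n$ and the test value $y$ are normal with a common unknown variance, that the null hypothesis is that $y$ shares the training mean, and that the alternative leaves the mean of $y$ unrestricted. Maximizing the likelihood under each hypothesis gives

$$\lambda = \left(\frac{S_x}{S_x + \frac{n}{n+1}\,(y-\bar{x})^2}\right)^{(n+1)/2} = \left(1 + \frac{T^2}{n-1}\right)^{-(n+1)/2}, \qquad T = \frac{y-\bar{x}}{s\sqrt{1+1/n}},$$

where $S_x = \sum_i (x_i-\bar{x})^2$ and $s^2 = S_x/(n-1)$. Under the null hypothesis $T$ has Student's t-distribution with $n-1$ degrees of freedom, and $\lambda$ decreases monotonically as $|T|$ grows, so

$$P = \Pr(\Lambda \le \lambda_{\mathrm{obs}}) = \Pr\left(|T_{n-1}| \ge |t_{\mathrm{obs}}|\right).$$

A small P-value therefore corresponds to a test value that lies far from the training mean in units of its predictive standard deviation.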

To calculate the generalized likelihood ratio, and hence the P-value, for a given test event, it is necessary to know, or to assume, the form of the probability distribution for the training set (see Fisk et al., 1993). Although this assumption is not strictly necessary, the calculations become much more tractable if the distributions can be assumed to be normal. In fact, if each discriminant can be assumed to be normally distributed and there are no missing data values, the generalized likelihood ratio can be calculated analytically (e.g., Fisk et al., 1996b). If the training sets (after distance correction) are unlikely to be samples from a normal distribution, one of two approaches can be taken: (1) determine the distribution from which the training samples come and compute the appropriate likelihood ratio, or (2) find a transformation that maps the training set to normal and compute the likelihood ratio under the assumption of normality. The first approach is much more difficult to implement than the second, so we employ the second.

5.1. Testing for Normality

To test each of the six discriminants for normality we apply three separate tests: the Anderson-Darling (A²) test (Anderson and Darling, 1954), the Wilk-Shapiro (W) test (Shapiro and Wilk, 1965), and the Lin and Mudholkar test (Lin and Mudholkar, 1980). If normality is rejected at the 0.01 significance level by any of the three tests, the discriminant is rejected as normal and a transformation is applied. If, after the transformation, the discriminant is still rejected as normal by any of the three tests, it is not used in the outlier test.
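Of the three tests, the Anderson-Darling statistic is the simplest to write down. The sketch below computes A² with the mean and standard deviation estimated from the sample, plus the commonly used small-sample adjustment A*² = A²(1 + 0.75/n + 2.25/n²). The function name is hypothetical, and the critical value for a 0.01-level decision must be taken from published tables; it is deliberately not hard-coded here.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(sample):
    """Hypothetical helper: return (A2, A2_star) for a normality test with
    estimated mean and standard deviation.

    A2_star applies the factor (1 + 0.75/n + 2.25/n**2); compare it with a
    tabulated critical value for the chosen significance level.
    """
    n = len(sample)
    if n < 3:
        raise ValueError("need at least 3 values")
    mu, sd = mean(sample), stdev(sample)
    if sd == 0:
        raise ValueError("sample has zero variance")
    phi = NormalDist().cdf
    eps = 1e-15  # keep CDF values away from 0 and 1 so the logs stay finite
    u = [min(max(phi((x - mu) / sd), eps), 1.0 - eps) for x in sorted(sample)]
    total = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
                for i in range(1, n + 1))
    a2 = -n - total / n
    return a2, a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


# Synthetic example: evenly spaced normal quantiles, and 10**z of the same
# values (strongly right-skewed, as raw amplitude ratios can be).
normal_like = [NormalDist().inv_cdf((i - 0.5) / 40) for i in range(1, 41)]
skewed = [10 ** z for z in normal_like]
print(anderson_darling_normal(normal_like)[1], anderson_darling_normal(skewed)[1])
```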
A class of transformations that has proved useful in transforming univariate data to normal is the Box-Cox family (Box and Cox, 1964), of which the logarithm and the square root are special cases (the latter up to a linear rescaling, which does not affect normality). For positive data (any finite set of data, including the discriminants for the test event, can be shifted to all-positive values without affecting the outlier test), the Box-Cox transformation is given by

$$x^{(\gamma)} = \frac{x^{\gamma} - 1}{\gamma} \qquad (4)$$

This family of transformations is continuous in $\gamma$: the limit $\gamma \to 0$ gives $x^{(0)} = \log x$ (the natural logarithm). For each set of discriminant data, $\gamma$ can be chosen to maximize the likelihood that the transformed data are samples from a normal distribution. One issue with choosing the best transformation for each discriminant separately is that the covariance structure of the original data is transformed in a complicated way. Choosing the same $\gamma$ for all six discriminants keeps the covariance structure intact, although some of the discriminants may be rejected as normal. We have found that the simpler approach of taking the log of all the data (the same Box-Cox transformation, $\gamma = 0$) transforms nearly all discriminant sets to normal at the 0.01 significance level.
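Equation (4), and the choice of $\gamma$ by maximum likelihood, can be sketched as follows. The profile log-likelihood used here, $-(n/2)\log\hat{\sigma}^2(\gamma) + (\gamma - 1)\sum_i \log x_i$, is the standard Box-Cox form for a normal model with unknown mean and variance. The grid search and the helper names are illustrative assumptions, not the procedure that produced Table 5.

```python
import math


def box_cox(x, gamma):
    """Box-Cox transform of Eq. (4); gamma == 0 gives the natural logarithm."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive data; shift the data first")
    if gamma == 0:
        return math.log(x)
    return (x ** gamma - 1.0) / gamma


def profile_loglik(data, gamma):
    """Box-Cox profile log-likelihood with additive constants dropped:
    -(n/2) * log(sigma2_hat) + (gamma - 1) * sum(log x), where sigma2_hat is
    the maximum-likelihood variance (divisor n) of the transformed data."""
    n = len(data)
    z = [box_cox(x, gamma) for x in data]
    zbar = sum(z) / n
    sigma2_hat = sum((zi - zbar) ** 2 for zi in z) / n
    jacobian = (gamma - 1.0) * sum(math.log(x) for x in data)
    return -0.5 * n * math.log(sigma2_hat) + jacobian


def best_gamma(data, grid=None):
    """Hypothetical helper: pick gamma on a grid (default -2 to 2, step 0.05)."""
    if grid is None:
        grid = [round(-2.0 + 0.05 * k, 2) for k in range(81)]
    return max(grid, key=lambda g: profile_loglik(data, g))


# Synthetic log-symmetric ratios: their logarithms are evenly spaced about zero.
ratios = [10 ** (0.3 * k) for k in range(-5, 6)]
print(best_gamma(ratios))
```

Choosing one $\gamma$ per discriminant, as this helper does, corresponds to the "best Box-Cox" column of Table 5; the text's preferred option fixes $\gamma = 0$ for all six.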

Recall that the distance corrections were estimated after taking the logarithm of the data, assuming that the distance dependence is linear on the log-log plot, and that the significance tests for the best-fit lines assumed that the errors were normally distributed after taking the log. Taking the logarithm here therefore provides some consistency between the linked procedures of distance correction and outlier removal (cf. Section 5.2). Note also that there is no reason to expect the ratio of two positive numbers, such as the regional amplitude ratios, to be normally distributed, and the logarithm transformation is known to typically make such data more nearly normal.
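A quick numerical illustration of the last point, using synthetic data rather than station data: amplitude ratios whose base-10 logarithms are symmetric about zero are strongly right-skewed on the original scale, and taking the log removes that skew. The helper `sample_skewness` is a plain moment estimator introduced only for this sketch.

```python
import math
from statistics import NormalDist


def sample_skewness(values):
    """Moment estimator: mean cubed deviation / (mean squared deviation)**1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic log10 ratios at evenly spaced normal quantiles (mean 0, s.d. 0.5).
log_ratios = [0.5 * NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
ratios = [10 ** v for v in log_ratios]

print(round(sample_skewness(ratios), 2))  # clearly positive: long right tail
print(round(sample_skewness([math.log10(r) for r in ratios]), 2))
```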

Figures 23-25 show histograms of the six discriminants for the CMAR training set before and after applying the log transformation. All six discriminant sets were rejected as normal at the 0.01 significance level before the log transformation, while after it all six were accepted as normal. For CMAR, then, it was necessary to transform all the data in order to apply the outlier test in its normal form. Another way of visualizing the data to check for normality is the quantile-quantile (Q-Q) plot. Figures 26-28 show Q-Q plots of the six discriminants for the WRA training set. If a data set is approximately normal, the sample quantiles computed from the data (square markers) should fall close to the quantiles of a normal distribution, represented by the straight line. As with CMAR, each of the six discriminant sets for WRA is rejected as normal before the log transformation and accepted as normal, at the 0.01 significance level, after it.
Figure 23. Histograms for CMAR before and after log transformation for Pn/Lg and Pn/Sn (2-4 Hz).

Figure 24. Histograms for CMAR before and after log transformation for Pn/Lg and Pn/Sn (4-6 Hz).

Figure 25. Histograms for CMAR before and after log transformation for Pn/Lg and Pn/Sn (6-8 Hz).

Figure 26. Q-Q plots for WRA before and after log transformation for Pn/Lg and Pn/Sn (2-4 Hz).

Figure 27. Q-Q plots for WRA before and after log transformation for Pn/Lg and Pn/Sn (4-6 Hz).

Figure 28. Q-Q plots for WRA before and after log transformation for Pn/Lg and Pn/Sn (6-8 Hz).
For each of the 50 Primary and Auxiliary stations with at least 10 events in the training set, we tested each discriminant data set for normality at the 0.01 significance level, both before and after the log transformation. The results are summarized in Tables 3 and 4. For comparison, results for each discriminant using the best Box-Cox transformation are shown in Table 5. Of the discriminant sets with sufficient data, only about 40% (122 of 283) are accepted as normal before transformation, whereas well over 90% (264 of 283) are accepted after the log transformation. If instead the best Box-Cox transformation is applied to each discriminant individually, almost 99% (280 of 283) are accepted as normal. Note that it is better not to apply the log transformation if the original data are accepted as normal, because study has shown that the outlier test has greater power for detecting explosions using Pn/Sn and Pn/Lg than using the logarithms of these quantities.


Table 3. Normality test results for discriminants with no transformation.

Discriminant      # Accepted   # Rejected   # Insufficient
Pn/Lg (2-4 Hz)    21           28           1
Pn/Lg (4-6 Hz)    24           24           2
Pn/Lg (6-8 Hz)    18           29           3
Pn/Sn (2-4 Hz)    19           29           2
Pn/Sn (4-6 Hz)    22           25           3
Pn/Sn (6-8 Hz)    18           26           6
Total             122          161          17

Table 4. Normality test results for discriminants with log transformation.

Discriminant      # Accepted   # Rejected   # Insufficient
Pn/Lg (2-4 Hz)    47           2            1
Pn/Lg (4-6 Hz)    43           5            2
Pn/Lg (6-8 Hz)    44           3            3
Pn/Sn (2-4 Hz)    47           1            2
Pn/Sn (4-6 Hz)    44           3            3
Pn/Sn (6-8 Hz)    39           5            6
Total             264          19           17



Table 5. Normality test results for discriminants with Box-Cox transformation.

Discriminant      # Accepted   # Rejected   # Insufficient
Pn/Lg (2-4 Hz)    49           0            1
Pn/Lg (4-6 Hz)    47           1            2
Pn/Lg (6-8 Hz)    47           0            3
Pn/Sn (2-4 Hz)    48           0            2
Pn/Sn (4-6 Hz)    47           0            3
Pn/Sn (6-8 Hz)    42           2            6
Total             280          3            17

5.2. Iterative Procedure to Remove Outliers

Once distance corrections and transformations to normality have been applied, each training set, in principle, consists of samples from the same normal earthquake population, with the exception of at most a small number of events. These may be events of other types, such as explosions or large mining blasts, or earthquakes with some type of anomalous measurement. If we test for outliers at the 0.01 significance level, then on average one legitimate event in a training set of size 100 will be declared an outlier. It is not clear that such events should be removed from the training set unless there is convincing evidence that they truly do not belong to their respective regional populations. However, for the purpose of refining the distance corrections, we do remove all events with P-values less than 0.005 (from sets of size, e.g., 100), since outliers can adversely alter the estimates of the distance corrections.
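The arithmetic behind these thresholds is simple if each legitimate event is assumed to be flagged independently with probability equal to the significance level $\alpha$ (an idealization, since the leave-one-out tests share data). For a training set of $n$ events, the expected number of legitimate events flagged is

$$\mathrm{E}[N_{\mathrm{flagged}}] = n\alpha: \qquad 100 \times 0.01 = 1, \qquad 100 \times 0.005 = 0.5,$$

and the probability that at least one legitimate event is flagged is $1 - (1-\alpha)^n$, about 0.63 for $\alpha = 0.01$ and about 0.39 for $\alpha = 0.005$ when $n = 100$.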

To remove outliers from a data set we use a leave-one-out procedure. That is, we extract each event from the training set, one by one, and test it as an outlier relative to the remaining events in the set. If its P-value is less than 0.005, the event is removed. After all outliers have been removed in this manner, the training set will differ somewhat from the original and may yield a different set of distance corrections, so we use an iterative procedure. Using the remaining events in the training set, a new set of straight lines is fitted to the discriminant data as in Section 3. The individual discriminants are tested for normality and transformed if normality is rejected. Once the new distance corrections and, if necessary, the log transformation have been applied, outliers are again removed using the leave-one-out procedure. We continue iterating until no events with P-values less than 0.005 remain in the set. Typically this procedure takes one or two iterations when the log transformation is used. In the next section, on station categorization, we present the number of outliers removed from the data set for each Primary and Auxiliary station.
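The leave-one-out loop can be sketched for a single discriminant as below. This is a simplified, hypothetical illustration: it uses the univariate normal case, where the P-value is the two-sided Student's t tail probability of $T = (y-\bar{x})/(s\sqrt{1+1/n})$ with $n-1$ degrees of freedom, instead of the six-discriminant generalized likelihood ratio, and it leaves the re-fitting of distance corrections and the normality re-test as a comment. The t tail is computed from the regularized incomplete beta function with the standard continued-fraction method, so only the standard library is needed.

```python
import math
from statistics import NormalDist


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, dof):
    """P(|T| >= |t|) for Student's t-distribution with dof degrees of freedom."""
    return reg_inc_beta(0.5 * dof, 0.5, dof / (dof + t * t))


def loo_p_value(train, y):
    """Hypothetical univariate outlier P-value of y relative to train."""
    n = len(train)
    if n < 3:
        raise ValueError("need at least 3 training values")
    xbar = sum(train) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in train) / (n - 1))
    if s == 0.0:
        return 1.0 if y == xbar else 0.0
    t = (y - xbar) / (s * math.sqrt(1.0 + 1.0 / n))
    return t_two_sided_p(t, n - 1)


def remove_outliers(values, p_cut=0.005):
    """Iterative leave-one-out outlier removal; returns (kept, removed)."""
    kept, removed = list(values), []
    while len(kept) > 3:
        pvals = [loo_p_value(kept[:i] + kept[i + 1:], v) for i, v in enumerate(kept)]
        flagged = [v for v, p in zip(kept, pvals) if p < p_cut]
        if not flagged:
            break
        removed.extend(flagged)
        kept = [v for v, p in zip(kept, pvals) if p >= p_cut]
        # Full procedure (not shown): re-fit the distance corrections and
        # re-test normality on `kept` here before the next pass.
    return kept, removed


data = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)] + [20.0]
kept, removed = remove_outliers(data)
print(removed, len(kept))
```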
6. Station Categorization

We now define four categories that describe, in a relatively objective manner, the operational status of the initial training sets for each Primary and Auxiliary station in the existing IMS network, for use in the experimental evaluation of event characterization capabilities at the PIDC. The categories are in decreasing order of operational utility for the regional population analysis, with Category 1 the most useful and Category 4 the least. Only Category 1 and Category 2 training sets are currently used in the regional population analysis performed routinely at the PIDC. (Other criteria, such as a measure of the residuals about the distance corrections, could also be used to quantify the status of the training sets.)

Category 1:

1. There are 20 or more events with at least one discriminant with SNR > 1.5.

2. The ratio of the distances of the farthest and closest events is ≥ 2.

3. All six estimated slopes are > 0 and < 3.

Category 2:

1. There are 20 or more events with at least one discriminant with SNR > 1.5.

2. The ratio of the distances of the farthest and closest events is < 2.

3. Estimated slopes can have any value.

Category 3:

1. There are 20 or more events with at least one discriminant with SNR > 1.5.

2. The ratio of the distances of the farthest and closest events is ≥ 2.

3. At least one estimated slope is < 0 or > 3.

Category 4:

1. There are fewer than 20 events with at least one discriminant with SNR > 1.5.
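The four category rules translate directly into a small decision function. In this sketch the function name `categorize_station` and its arguments are hypothetical. The distance-ratio boundary is treated as ≥ 2 for Categories 1 and 3 so that it complements the < 2 test for Category 2, and a slope exactly equal to 0 or 3 is assumed to go to Category 3, since it fails the strict Category 1 bounds.

```python
def categorize_station(n_events, d_min_km, d_max_km, slopes):
    """Hypothetical helper: assign Category 1-4 from the rules above.

    n_events: events with at least one discriminant at SNR > 1.5.
    slopes: the six estimated slopes alpha(f) (Pn/Lg and Pn/Sn, three bands).
    """
    if n_events < 20:
        return 4
    if len(slopes) != 6:
        raise ValueError("expected six slope estimates")
    if d_min_km <= 0:
        raise ValueError("distances must be positive")
    # Closely spaced events: dmax/dmin < 2, i.e. log10(dmax/dmin) < log10(2) ~ 0.3.
    if d_max_km / d_min_km < 2.0:
        return 2
    # Ratio >= 2 (assumed boundary): Category 1 only if every slope is in (0, 3).
    if all(0.0 < a < 3.0 for a in slopes):
        return 1
    return 3


# Event counts and distances from Tables 6, 8, 10 and 12; the slope values are
# placeholders chosen only to exercise each branch.
print(categorize_station(87, 246, 2220, [1.2, 1.5, 1.4, 1.8, 1.1, 1.6]))   # CMAR-like: 1
print(categorize_station(137, 1174, 2220, [3.5, 4.1, 1.0, 2.0, 3.2, 0.5]))  # ASAR-like: 2
print(categorize_station(45, 946, 2192, [1.0, -0.1, 1.2, -0.2, 1.5, 1.1]))  # TXAR-like: 3
print(categorize_station(16, 408, 2015, [1.0] * 6))                         # ABKT-like: 4
```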


Clearly, as more data become available there will be fewer stations in Category 4. Category 1 stations were described in detail in Section 3. Category 2 stations have all events in the training set closely spaced in distance. We define close as $\Delta_{\max}/\Delta_{\min} < 2$, where $\Delta_{\max}$ is the distance of the event farthest from the station and $\Delta_{\min}$ is the distance of the event closest to the station. Equivalently, on the log plot the separation between the farthest and closest events is less than the constant $\log_{10} 2 \approx 0.3$. If most events are at nearly the same distance, distance corrections are much less important, since all events will be corrected by roughly the same amount, whether accurately or not. All stations in Category 3 have at least one slope of a suspicious character, requiring further, more detailed study to determine whether simple distance corrections are adequate to correct discriminant values for events at different locations.

Tables 6 and 7 list the Primary and Auxiliary stations, respectively, in Category 1. Similarly, Tables 8 and 9 list the stations in Category 2, Tables 10 and 11 those in Category 3, and Tables 12 and 13 those in Category 4.


Table 6. Category 1 Primary Stations.

Station  Category  # Events     # Events     Δmin    Δmax    # Outliers
                   (SNR > 1.5)  (SNR > 2.0)  (km)    (km)
CMAR     1         87           47           246     2220    0
CPUP     1         30           11           724     2222    1
ESDC     1         35           15           273     2218    2
GERES    1         161          82           323     2221    8
ILAR     1         85           75           163     2201    3
KSAR     1         141          89           159     2215    0
MJAR     1         175          74           138             10
NORES    1         38           14           435     2167    1
PDY      1         24           19           372     2163    0

Table 7. Category 1 Auxiliary Stations.

Station  Category  # Events     # Events     Δmin    Δmax    # Outliers
                   (SNR > 1.5)  (SNR > 2.0)  (km)    (km)
API      1         25           17           144     1572    0
BBB      1         27           21                   1981    0
DAVOS    1         95           55           235     2179    1
HFS      1         31           10           569     2147    1
HNR      1         45           27           139     2212    0
INK      1         59           46           368     2211    1
KKJ      1         80           35           278     2199    3
OGS      1         54           40           147     2181    3
SPITS    1         40           32           180     2158    0
VRAC     1         39           24           265     2107    4

Table 8. Category 2 Primary Stations.

Station  Category  # Events     # Events     Δmin    Δmax    # Outliers
                   (SNR > 1.5)  (SNR > 2.0)  (km)    (km)
ASAR     2         137          58           1174    2220    1
WRA      2         263          163          1265    2213    13

Table 9. Category 2 Auxiliary Stations.

Station  Category  # Events     # Events     Δmin    Δmax    # Outliers
                   (SNR > 1.5)  (SNR > 2.0)  (km)    (km)
CTA      2         78           22           1137    2221    2
MBC      2         27           19           1335    2169    0

Table 10. Category 3 Primary Stations.

Station  Category  # Events     # Events     Δmin    Δmax    # Outliers
                   (SNR > 1.5)  (SNR > 2.0)  (km)    (km)
ARCES    3         34           24           329     2132    1
BIT      3         20           3            847     2127    0
FINES    3         34           13           425     2207    1
LOR      3         60           40           461     1958    7
LPAZ     3         39           22           353     2216    0
PDAR     3         33           10           284     2099    3
PLCA     3         22           13           392     1859    0
TXAR     3         45           29           946     2192    0
YKA      3         32           24           785     2206    0
ZAL      3         53           10           496     2217    2



Table 11. Category 3 Auxiliary Stations.

Station  Category  # Events     # Events     Δmin    Δmax    # Outliers
                   (SNR > 1.5)  (SNR > 2.0)  (km)    (km)
ALQ      3         28           8            852     2178    0
DLBC     3         54           37           617     2212    0
EKA      3         56           37           800     2216    1
JER      3         21           18           326     2114    0
KVAR     3         30           13           256     2217    0
NEW      3         21           3            768     2182    1
NIL      3         45           17           270     2139    2
PMG      3         50                        280     2001    0
SHK      3         65           44           281     2032    3
SNZO     3         30           24           167     2134    0

Table 12. Category 4 Primary Stations.

Station  Category  # Events     # Events     Δmin    Δmax    # Outliers
                   (SNR > 1.5)  (SNR > 2.0)  (km)    (km)
ABKT     4         16           7            408     2015    2
BDFB     4         0            0            N/A     N/A     N/A
BGCA     4         2            1            1156    2095    0
BOSA     4         14           11           236     2106    1
DBIC     4         12           4            1198    1733    0
HIA      4         0            0            N/A     N/A     N/A
KBZ      4         9            3            292     2005    0
MAW      4         0            0            N/A     N/A     N/A
MNV      4         19           12           458     1809    0
NRI      4         2            1            1772    1827    0
SCHQ     4         4            1            1294    2158    0
STKA     4         3            0            2036    2220    0
ULM      4         6            1            913     2182    0
VNDA     4         0            0            N/A     N/A     N/A



Table 13. Category 4 Auxiliary Stations.

Station  Category  # Events     # Events     Δmin    Δmax    # Outliers
                   (SNR > 1.5)  (SNR > 2.0)  (km)    (km)
AAE      4         0            0            N/A     N/A     N/A
AQU      4         0            0            N/A     N/A     N/A
ARU      4         4            2            1779    2189    0
BORG     4         3            1            198     1854    0
DAV      4         6            2            552     1416    0
ELK      4         18           11           380     1762    0
FITZ     4         15           6            1310    2162    0
FRB      4         1            0            1297    1297    0
ISG      4         18           7            67      1321    0
JTS      4         0            0            N/A     N/A     N/A
KIEV     4         12           8            438     1828    0
LSZ      4         0            0            N/A     N/A     N/A
MLR      4         0            0            N/A     N/A     N/A
MSEY     4         0            0            N/A     N/A     N/A
NNA      4         13           10           353     2083    0
NWAO     4         3            2            1848    1990    0
OBN      4         18           5            1602    2192    0
PFO      4         17           11           283     1594    0
PTGA     4         0            0            N/A     N/A     N/A
RAR      4         12           5            1399    2152    0
RPN      4         0            0            N/A     N/A     N/A
SADO     4         2            1            929     1439    0
SDV      4         0            0            N/A     N/A     N/A
SFJ      4         1                         2028    2028    0
SUR      4         3            3            711     909     0
TKL      4         1            1            1889    1889    0
TSUM     4         3            2            878     1279    0
ULN      4         12           10           716     2149    0

Each table lists the number of events with at least one discriminant measured at SNR > 1.5, which was used to determine the category, as well as the number of events with SNR > 2.0, for future consideration. Each table also lists the distance from the station to the farthest event, Amax, and to the closest event, Amin, both in kilometers. The last column gives the number of outliers, determined by the methods of Section 5, that have been removed from the training set for that station.
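The bookkeeping behind each row of these tables can be sketched as follows. This is a minimal Python sketch, not the software used to build the tables: the `RegionalEvent` record, its field names, and the rule that Amin and Amax are taken over the events passing the SNR > 1.5 criterion are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RegionalEvent:
    """Hypothetical per-event record; not the PIDC database schema."""
    station: str
    distance_km: float
    # SNR of each measured discriminant, keyed by a label such as "Pn/Lg 2-4 Hz".
    snr: Dict[str, float] = field(default_factory=dict)
    # True if the outlier screening (Section 5) removed this event.
    is_outlier: bool = False


def passes_snr(event: RegionalEvent, threshold: float) -> bool:
    """True if at least one discriminant exceeds the SNR threshold."""
    return any(value > threshold for value in event.snr.values())


def station_summary(events: List[RegionalEvent], station: str) -> Dict[str, Optional[float]]:
    """Compute one table row for a single station (assumed definitions)."""
    at_station = [e for e in events if e.station == station]
    usable = [e for e in at_station if passes_snr(e, 1.5)]
    n_snr_2_0 = sum(1 for e in at_station if passes_snr(e, 2.0))
    if not usable:
        # Corresponds to the N/A entries for stations without usable events.
        return {"n_snr_1_5": 0, "n_snr_2_0": n_snr_2_0,
                "amin_km": None, "amax_km": None, "n_outliers": None}
    distances = [e.distance_km for e in usable]
    return {
        "n_snr_1_5": len(usable),
        "n_snr_2_0": n_snr_2_0,
        "amin_km": min(distances),  # closest event to the station
        "amax_km": max(distances),  # farthest event from the station
        "n_outliers": sum(1 for e in usable if e.is_outlier),
    }


if __name__ == "__main__":
    demo = [  # made-up events for illustration only
        RegionalEvent("XYZ", 400.0, {"Pn/Lg 2-4 Hz": 1.8}),
        RegionalEvent("XYZ", 1200.0, {"Pn/Lg 2-4 Hz": 2.5, "Pn/Sn 4-6 Hz": 1.0}),
        RegionalEvent("XYZ", 2000.0, {"Pn/Sn 6-8 Hz": 3.1}, is_outlier=True),
        RegionalEvent("XYZ", 900.0, {"Pn/Lg 2-4 Hz": 1.2}),
    ]
    print(station_summary(demo, "XYZ"))
```

Here the fourth demo event fails the SNR > 1.5 criterion and so does not affect the count, Amin, or Amax.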
Figure 29 shows the location of each Primary station on a map of the world. Category 1 stations are depicted by green squares, Category 2 stations by yellow triangles, Category 3 stations by orange diamonds, and Category 4 stations by red circles. Figure 30 is a similar plot for the Auxiliary stations.
Figure 29. Locations of Primary seismic stations: Category 1 - green squares, Category 2 - yellow triangles, Category 3 - orange diamonds, Category 4 - red circles.
Figure 30. Same as Figure 29, but for the Auxiliary seismic stations.




7. Conclusions and Recommendations


In this report we have described initial efforts to establish training sets of regional seismic data for use in the regional population (i.e., outlier) analysis. Each Primary and Auxiliary seismic station in the existing IMS network has been categorized in a way that qualitatively describes the utility of its initial training set for the outlier analysis, in terms of the number of events with regional amplitude ratios satisfying an SNR criterion and with reasonable distance corrections. At this time, Category 1 and Category 2 stations, listed in Tables 6-9, are being used for preliminary experimental evaluation of event characterization capabilities at the PIDC.

Category 3 stations, which are not recommended for use at this time, all have at least one discriminant with an unusual dependence on distance from the station. Our current “prejudice” is that the discriminant ratios Pn/Lg and Pn/Sn in the 2-4 Hz, 4-6 Hz, and 6-8 Hz bands should typically be increasing functions of distance that can be approximated by a power-law relation, as in Eq. (1) (straight lines in log-log plots), with a positive exponent that is not too large (less than three). We are currently conducting studies to determine how to interpret some of the unexpected results we have obtained by fitting Eq. (1) to the IMS data.
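The slope criterion can be made concrete with a short sketch. It assumes that Eq. (1) has the power-law form R = c·Δ^b, so that log10 R = log10 c + b·log10 Δ is a straight line in a log-log plot; the function names and the bounds 0 < b < 3 are illustrative choices taken from the paragraph above, not the software used in this study.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(distances_km: Sequence[float], ratios: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of log10(R) = a + b*log10(distance).

    Returns (a, b): 10**a plays the role of c, and b is the exponent
    (the slope of the straight line in a log-log plot). Assumed form of Eq. (1).
    """
    if len(distances_km) != len(ratios) or len(ratios) < 2:
        raise ValueError("need at least two (distance, ratio) pairs")
    x = [math.log10(d) for d in distances_km]
    y = [math.log10(r) for r in ratios]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("need at least two distinct distances")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def slope_is_expected(b: float, max_exponent: float = 3.0) -> bool:
    """Current 'prejudice': exponent positive and less than about three."""
    return 0.0 < b < max_exponent


if __name__ == "__main__":
    # Synthetic ratios that follow R = 0.01 * distance**1.5 exactly.
    dists = [200.0, 500.0, 1000.0, 1500.0, 2000.0]
    rats = [0.01 * d ** 1.5 for d in dists]
    a, b = fit_power_law(dists, rats)
    print(f"a = {a:.3f}, b = {b:.3f}, expected slope: {slope_is_expected(b)}")
```

A discriminant whose fitted b falls outside the assumed range would be flagged, which is roughly the condition that places a station in Category 3.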

The accumulation of sufficient data, or further theoretical understanding, may show that some of the unexpected slopes are actually what should be expected. In such cases, some of the Category 3 training sets could be included in Category 1 without the restrictions on the slopes.

On the other hand, other variables, such as the direction from the station (i.e., azimuth), radiation pattern effects, and the tectonic characteristics of the propagation path, may influence the statistical distribution of regional discriminants and must then be taken into account. A first step in addressing such possibilities is a detailed study of how the discriminant values depend on these variables, in conjunction with distance from the station. Such a study may suggest ways to correct the data for all of these variables, for example by multivariate regression, or ways to divide the region around some stations into tectonic subregions. This will require more regional data, which are now accumulating steadily at the PIDC.
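One way such a multivariate correction might look is sketched below. The choice of regressors (log distance plus the cosine and sine of azimuth) is our assumption for illustration; radiation-pattern or path-tectonic terms would enter as additional columns in `design_row`. The normal equations are solved with a small Gaussian elimination so that the example needs only the Python standard library.

```python
import math
from typing import List, Sequence


def design_row(distance_km: float, azimuth_deg: float) -> List[float]:
    """Hypothetical regressors: intercept, log10 distance, cos and sin of azimuth."""
    az = math.radians(azimuth_deg)
    return [1.0, math.log10(distance_km), math.cos(az), math.sin(az)]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system: data do not constrain all terms")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_multivariate(distances_km: Sequence[float], azimuths_deg: Sequence[float],
                     ratios: Sequence[float]) -> List[float]:
    """Least-squares fit of log10(ratio) on the regressors via the normal equations."""
    rows = [design_row(d, az) for d, az in zip(distances_km, azimuths_deg)]
    y = [math.log10(r) for r in ratios]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def residual(coef: Sequence[float], distance_km: float, azimuth_deg: float,
             ratio: float) -> float:
    """Corrected discriminant: observed log10 ratio minus the fitted trend."""
    predicted = sum(c * x for c, x in zip(coef, design_row(distance_km, azimuth_deg)))
    return math.log10(ratio) - predicted
```

The residuals, rather than the raw ratios, would then be compared with the training set. For many regressors, a QR-based solver such as `numpy.linalg.lstsq` would be numerically preferable to the normal equations used here.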

For example, Figure 31 shows a tectonic grid with 2 by 2 degree resolution that was established by Oli Gudmundsson and provided to us by Sereno and Jenkins of SAIC. Figure 32 shows an enlarged view of the tectonic grid for the region surrounding Australia, with regional events (including those below mb 3.5) detected by ASAR and WRA depicted by circular markers. Most of the regional events depicted in Figure 32 occurred in the tectonically active subregion near Indonesia. Figure 32 illustrates how the region surrounding a given station could be divided into tectonic subregions, so that the training sets for the regional population analysis could be split into corresponding subregions and a new event compared only with those training events that occurred in the same subregion. Jenkins et al. (1996) have previously considered the use of this grid in establishing attenuation corrections for distinct tectonic subregions.
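The subregion split described above amounts to a grid lookup followed by a filter. The sketch below assumes that the grid is available as a mapping from 2 by 2 degree cell indices to tectonic labels; the labels ("active", "shield"), the `Event` record, and the rule that "same subregion" means "same tectonic label" are hypothetical stand-ins, not the contents or format of the Gudmundsson grid.

```python
import math
from typing import Dict, List, NamedTuple, Tuple

Cell = Tuple[int, int]


class Event(NamedTuple):
    evid: str
    lat: float  # degrees, -90..90
    lon: float  # degrees, -180..180


def grid_cell(lat: float, lon: float, res_deg: float = 2.0) -> Cell:
    """Index (row, column) of the res_deg x res_deg cell containing (lat, lon)."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range")
    n_lat = int(round(180.0 / res_deg))
    i = min(int(math.floor((lat + 90.0) / res_deg)), n_lat - 1)  # lat = 90 joins the top row
    j = int(math.floor(((lon + 180.0) % 360.0) / res_deg))       # lon = 180 wraps to -180
    return i, j


def subregion_of(event: Event, grid: Dict[Cell, str], default: str = "unknown") -> str:
    """Tectonic label of the cell containing the event (hypothetical grid)."""
    return grid.get(grid_cell(event.lat, event.lon), default)


def same_subregion_training(test: Event, training: List[Event],
                            grid: Dict[Cell, str]) -> List[Event]:
    """Training events whose cell carries the same tectonic label as the test event."""
    target = subregion_of(test, grid)
    return [e for e in training if subregion_of(e, grid) == target]


if __name__ == "__main__":
    grid = {  # made-up labels for three cells
        grid_cell(-7.5, 118.5): "active",
        grid_cell(-3.0, 128.0): "active",
        grid_cell(-25.0, 133.0): "shield",
    }
    test_ev = Event("t", -7.0, 119.0)
    training = [Event("a", -3.0, 128.0), Event("b", -25.0, 133.0), Event("c", 10.0, 0.0)]
    print([e.evid for e in same_subregion_training(test_ev, training, grid)])
```

Events in cells missing from the mapping fall into a catch-all "unknown" subregion here; a real implementation would need a deliberate policy for such events.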

Regardless of how the studies on the distance corrections turn out, we are interested in the possibility that the outlier test can be validly applied using a training set that contains only those events that are close to the test event, so that distance corrections need not be made. In the current method of analyzing discriminants, “close” would mean having a similar distance from the station. A better definition of “close” would require all training events to lie within a certain distance of the test event itself; this would require much more data, so that enough training events occur at roughly the same location. In either case, quantifying the notion of “close” is a nontrivial problem. Essentially, one must determine whether it is better to include a potential training event at a given distance from the test event than to leave it out of the set. This, of course, depends on how much the distribution of discriminants for events at the more distant location differs from that for events at the test event location. If this were known, a correction could be made; the point, however, is that in these cases the dependence on distance is not known and is often difficult to approximate. Thus, the problems of determining how discriminant distributions depend on location and of defining what is meant by “close” are intimately related.
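The two notions of “close” can be sketched as follows. This is a minimal sketch under stated assumptions: the spherical-Earth haversine distance, the factor of 1.5 in distance from the station, the 100 km radius around the test epicenter, and the `Ev` record are illustrative choices, not values established in this report.

```python
import math
from typing import List, NamedTuple

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical-Earth assumption


class Ev(NamedTuple):
    evid: str
    lat: float
    lon: float
    dist_to_station_km: float


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance between two epicenters, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def close_in_station_distance(test: Ev, training: List[Ev], factor: float = 1.5) -> List[Ev]:
    """Current sense of 'close': distance from the station within a factor of the test event's."""
    limit = math.log10(factor)
    log_test = math.log10(test.dist_to_station_km)
    return [e for e in training
            if abs(math.log10(e.dist_to_station_km) - log_test) <= limit]


def close_in_location(test: Ev, training: List[Ev], radius_km: float = 100.0) -> List[Ev]:
    """Stricter sense of 'close': epicenter within radius_km of the test event."""
    return [e for e in training
            if great_circle_km(test.lat, test.lon, e.lat, e.lon) <= radius_km]
```

The trade-off discussed above shows up directly in the two parameters: widening `factor` or `radius_km` admits more training events but mixes in events whose discriminant distribution may differ from that at the test location.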

To illustrate the feasibility of this approach, Figures 33 and 34 show cumulative histograms of the number of events at each station within a log distance range of log 1.5, with a common set of discriminants (Figure 33) and with at least one common discriminant (Figure 34). Events from all of the stations were combined in generating these plots. Figure 33 illustrates that, on average, roughly 30% of the regional events have at least 20 events recorded at the same station within the specified distance range and with the same set of regional discriminants. Similarly, Figure 34 illustrates that, on average, roughly 70% of the regional events have at least 20 events recorded at the same station within the specified distance range and with at least one common regional discriminant. As more regional data accumulate over time, these percentages should increase. This suggests that after a sufficient period of time, e.g., three or more years, there may be enough events over a broad enough range of regional distances at many IMS stations to allow direct comparison of events with training events at equivalent distances from the same station. This could alleviate the need to apply distance corrections in many cases.
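The statistic summarized in Figures 33 and 34 can be sketched roughly as follows. We assume here that “within a log distance range of log 1.5” means |log10 Δ_i - log10 Δ| ≤ log10 1.5 around each event's own distance; that interpretation, the `StationEvent` record, and the function names are our assumptions, and the sketch does not reproduce the figures. The pairwise loop is O(n²), which is adequate for an illustration.

```python
import math
from typing import FrozenSet, List, NamedTuple


class StationEvent(NamedTuple):
    station: str
    distance_km: float
    discriminants: FrozenSet[str]  # labels of the discriminants measured for the event


def neighbor_count(index: int, events: List[StationEvent],
                   log_window: float = math.log10(1.5), mode: str = "any") -> int:
    """Count other events at the same station within +/- log_window in log10 distance.

    mode="same": require an identical discriminant set (cf. Figure 33).
    mode="any":  require at least one discriminant in common (cf. Figure 34).
    """
    if mode not in ("same", "any"):
        raise ValueError("mode must be 'same' or 'any'")
    ev = events[index]
    count = 0
    for j, other in enumerate(events):
        if j == index or other.station != ev.station:
            continue
        if abs(math.log10(other.distance_km) - math.log10(ev.distance_km)) > log_window:
            continue
        if mode == "same" and other.discriminants != ev.discriminants:
            continue
        if mode == "any" and not (other.discriminants & ev.discriminants):
            continue
        count += 1
    return count


def fraction_with_neighbors(events: List[StationEvent], min_neighbors: int = 20,
                            mode: str = "any") -> float:
    """Fraction of events that have at least min_neighbors qualifying neighbors."""
    if not events:
        return 0.0
    hits = sum(1 for i in range(len(events))
               if neighbor_count(i, events, mode=mode) >= min_neighbors)
    return hits / len(events)
```

Because the number of neighbors per event grows as events accumulate, re-running `fraction_with_neighbors` on a longer data set is one simple way to track the growth anticipated above.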

Each of the studies discussed above merits further research, and we plan to pursue each of them in the near future. In addition, we plan to make continued iterative improvements to the existing regional distance corrections and training sets as more data are collected at the PIDC.
Figure 31. Tectonic grid of the world, with 2 by 2 degree resolution, established by Oli Gudmundsson.
Figure 32. Enlarged view of the tectonic grid for the region surrounding Australia, with regional events detected by ASAR and WRA depicted by circular markers.